Information storage and retrieval system

ABSTRACT

Conventional information storage systems are subject to numerous practical constraints, such as the requirement that blocks be physically contiguous and that storage blocks be created in advance. Information retrieval in these systems has required indices, which take a long time to generate, and the structure of these systems makes them prone to deadlock, since the indices are updated and the range of exclusion broadened whenever the referent information is modified. This invention exploits the random-access capability of semiconductor memory to achieve high speeds and minimize the maintenance load. It introduces location tables and alternate-key tables in place of these indices. It also stores multiple records in a single block and can handle variable-length records and spanned records. The location tables manage the storage blocks. An alternate-key block holds entries, each made up of an alternate (substitute) key, the number of the block holding the record, and the record's primary key; a target record may be retrieved by searching either the location table or the alternate-key table. Binary search is a well-known high-speed method of querying tables, but other methods may be used as well.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The invention relates to the field of information storage and retrieval, particularly to the storage and reading of data with computers, and also to data management, in that it provides data storage and retrieval at high speeds, high ratios of data to storage capacity, and greatly reduced maintenance time.

[0003] 2. Description of Related Art

[0004] Conventional information storage systems are subject to numerous practical constraints, such as the requirement that blocks be physically contiguous and that storage blocks be created in advance. Moreover, in most systems random access requires the creation of indices, which take a long time to generate, and the structure of these systems makes them prone to deadlock, since the indices are updated and the range of exclusion broadened when the referent information is modified. Although direct access systems do not use indices, they relate record keys to storage locations by means of special programs known as randomizing routines. In practice, such systems do not allow sequential access and exhibit low storage efficiency compared with methods that use indices.

SUMMARY OF THE INVENTION

[0005] Since conventional information storage and retrieval systems presumed that their data would be stored on magnetic disk, some of their shortcomings are unavoidable. On the other hand, the price of semiconductors has fallen remarkably, and the conditions for using semiconductors as memory devices are falling into place. Semiconductors require no physical movement or rotation and allow high-speed storage and reading even when addresses are not contiguous. A memory device built to exploit these properties, consisting of semiconductors alone or of semiconductors combined with random-access memory media, would enable high-speed storage and retrieval processing. A semiconductor may be used as the primary memory device or may be provided to the invention as an external memory device. Storing information sequentially in blocks, using overflow blocks when the insertion of information causes an overflow, managing storage blocks with location tables or alternate-key blocks instead of indices, and performing retrievals from these location tables or alternate-key blocks together enable high-speed storage and reading, improve the efficiency of information storage, and minimize the occurrence of deadlock.

[0006] Location tables and alternate-key tables are introduced in place of indices. Records are not stored independently; rather, multiple records are stored in a single block. FIG. 4 illustrates the structure of a block. The FROM and TO segments in FIG. 4 hold the minimum and maximum key values of the block, respectively, but neither is strictly necessary, and the invention may be implemented with either one alone. When long records are used, a single record may be matched to a single block, and a format in which a single record is stored across two or more blocks (spanned records) is also possible. All primary blocks in a file are of equal length, as are all overflow blocks in a file, which simplifies region management. Because blocks have a fixed length, a block can be located by fixed-length arithmetic even when variable-length records are stored in it. The location table secures the required contiguous region in advance. A single record (entry) in a location table manages a single primary block in the data record storage region. Although an entry could also manage multiple primary blocks, primary blocks may be of any given size, so it is simpler not to do so if the implementation allows the block size to be changed. Blocks are classified into primary blocks and overflow blocks, and only the primary blocks are managed in the location table. Records are stored first in primary blocks. If the insertion of a record leaves a block unable to hold its records, a single overflow block is allocated to that block. If that overflow block is not sufficient, one more overflow block is allocated. Overflow blocks are managed as dependents of their primary block; they are not managed in the location table but are reached only through a pointer from the primary block.
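The block layout described above can be modelled in a few lines. The following Python sketch is illustrative only: `Block`, `Record`, `BLOCK_SIZE`, `register_primary` and the dictionary standing in for the storage medium are hypothetical names, not part of the specification. It shows a fixed-length block holding several variable-length records, the FROM/TO key range, an overflow chain reached only by pointer, and a location table that registers primary blocks alone.

```python
from dataclasses import dataclass, field
from typing import Optional

BLOCK_SIZE = 64  # hypothetical fixed block length in bytes


@dataclass
class Record:
    key: int      # unique primary key
    data: bytes   # variable-length payload


@dataclass
class Block:
    records: list = field(default_factory=list)  # kept in primary-key order
    overflow: Optional["Block"] = None            # pointer to a dependent overflow block

    def used(self) -> int:
        return sum(len(r.data) for r in self.records)

    def fits(self, rec: Record) -> bool:
        return self.used() + len(rec.data) <= BLOCK_SIZE

    def chain(self):
        """Yield this primary block followed by its overflow blocks."""
        blk = self
        while blk is not None:
            yield blk
            blk = blk.overflow

    def from_key(self):
        """FROM segment: smallest key held by the block."""
        return self.records[0].key if self.records else None

    def to_key(self):
        """TO segment: largest key across the block and its overflow chain."""
        keys = [r.key for b in self.chain() for r in b.records]
        return keys[-1] if keys else None


# Simulated medium (address -> block) and location table (block number -> address).
# Only primary blocks are registered; overflow blocks are reached via Block.overflow.
storage: dict = {}
location_table: list = []


def register_primary(block: Block, address: int) -> int:
    storage[address] = block
    location_table.append(address)
    return len(location_table) - 1  # block numbers start at 0
```

In this model an overflow block never receives a location-table entry, which is the property that keeps location-table rewrites to a single entry.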

[0007] Since overflow blocks are not managed in the location table, no entries are inserted into the location table, the time needed to rewrite the location table is minimized, and any rewrite of the location table touches only a single entry. The range of any exclusion is therefore radically narrowed, and the possibility of deadlock is greatly reduced.

[0008] A deadlock occurs when two different tasks on a single computer acquire exclusive access to two or more of the same resources in different orders; it does not occur if every task uses the same order of exclusion, and narrower ranges of exclusion reduce the probability of deadlock. Conventional indices are made up of multiple levels. When a record's key is modified, the index must be updated as well. When this affects only the lowest-level index, the exclusion range is limited, but in some cases higher-level indices are affected too. In such cases exclusion covers a range of multiple records and updating the indices takes a great deal of time, so exclusion is held for long periods and becomes a cause of repeated deadlocks.
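The ordering rule stated above is the standard way to rule out this kind of deadlock. As a hedged illustration (not the invention's own locking protocol, which the text does not specify), the Python sketch below has every task acquire the locks it needs in one canonical order, here sorted by a hypothetical resource name, so two tasks can never each hold a lock the other is waiting for.

```python
import threading
from contextlib import ExitStack


def lock_in_canonical_order(locks: dict) -> ExitStack:
    """Acquire every lock in `locks` (resource name -> Lock) in sorted-name order.

    Because all callers use the same order, no cycle of waiting tasks can form.
    The returned ExitStack releases the locks in reverse order when closed.
    """
    stack = ExitStack()
    try:
        for name in sorted(locks):
            stack.enter_context(locks[name])
    except BaseException:
        stack.close()  # release anything already acquired before re-raising
        raise
    return stack


def worker(locks: dict, rounds: int) -> None:
    for _ in range(rounds):
        with lock_in_canonical_order(locks):
            pass  # critical section touching every resource in `locks`
```

Two tasks that list the same resources in different orders still lock them identically, which is exactly the condition under which the paragraph says deadlock cannot occur.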

OBJECTS AND ADVANTAGES

[0009] Accordingly, several objects and advantages of the invention are:

[0010] (a) To provide high-speed random access by using primary keys and alternate keys;

[0011] (b) To provide high-speed sequential read and write operations;

[0012] (c) To keep the modifications caused by updating, modifying and deleting data to a minimum, shorten processing time, and provide high-speed storage by using location tables and alternate-key blocks instead of indices;

[0013] (d) To allow location tables and alternate-key blocks to be created (generated) in less time than is required to create indices;

[0014] (e) To eliminate the necessity of creating blocks in advance;

[0015] (f) To allow blocks to be added until no physical space remains on the storage medium;

[0016] (g) To provide record compression, made possible by the ability to handle variable-length records and record insertion (though in most cases compressed records cannot be updated);

[0017] (h) To allow multiple alternate keys to be used; and

[0018] (i) To provide greatly reduced database maintenance time.

BRIEF DESCRIPTION OF THE DRAWINGS

[0019] FIG. 1 illustrates the structure of a location table and a final pointer, either or both of which may be implemented with different lengths; the number of bytes given for each is for illustrative purposes only.

[0020] FIG. 2 illustrates the relationship between a location table and its blocks.

[0021] FIG. 3 illustrates the structures of an alternate-key table (and its entries) and an alternate-key block, either or both of which may be implemented with different lengths; the number of bytes given for each is for illustrative purposes only. The FROM and TO segments hold the minimum and maximum key values of the block, respectively, and the alternate-key block may hold both the FROM and TO values or only one of them.

[0022] FIG. 4 illustrates the structure of a block.

[0023] FIG. 5 illustrates the structure of a pre-alternate-key block.

[0024] FIG. 6 illustrates the relationship between primary blocks and overflow blocks. The blocks are drawn as occupying a contiguous region to aid understanding, but there is no requirement that they do so in practice.

[0025] FIG. 7 illustrates an example of keys applied to text data.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0026] Records stored with this system are restricted to records having a single unique primary key and zero or more alternate keys, which are non-unique (though they may happen to be unique). The system does not accommodate records that lack a unique key. However, a serial number or other unique key may be forcibly assigned to such a record when it is added, with reading then performed only in the physical order of the records or in the order of their alternate keys. Since no insertion is then involved, an overflow block is required only when updating a record increases its length.

[0027] In the following description, “addition” refers to storing a record whose primary key is larger than the largest primary key among the records currently stored, and “insertion” refers to storing a record whose primary key is smaller than that largest primary key.

[0028] First, we describe the storage system. The size of a location table is calculated from the number of records planned to be stored, the size of their blocks, and the number of primary blocks per location-table record, and a contiguous region of that size is secured on a storage medium. Similarly, the size and number of alternate-key blocks are determined, and a contiguous region of sufficient size is secured so that alternate-key entries for the planned number of records can be stored. However, if storage exceeds the number originally assumed, the contiguous region may fill up, making further storage impossible. In such cases, an additional contiguous storage region is secured, and an address-conversion table is used to treat the multiple contiguous regions as though they were a single contiguous region, allowing the system to accommodate more records than originally assumed. When there are multiple alternate keys, a region is secured for each one. The alternate-key blocks corresponding to different alternate keys need not be contiguous. FIG. 2 illustrates the relationship between a location table and its blocks.
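An address-conversion table only has to map a logical entry number to a region and an offset within it. The sketch below is a hypothetical Python model (the class name and the use of Python lists as stand-ins for separately secured contiguous regions are assumptions) showing how entries in several regions can be addressed as one continuous array.

```python
import bisect


class AddressConversionTable:
    """Presents several separately secured contiguous regions as one logical array.

    Each region is modelled as a Python list; starts[r] is the logical index
    of the first slot in region r.
    """

    def __init__(self) -> None:
        self.regions: list = []
        self.starts: list = []

    def add_region(self, size: int) -> None:
        """Secure an additional contiguous region of `size` slots."""
        start = len(self)
        self.regions.append([None] * size)
        self.starts.append(start)

    def __len__(self) -> int:
        return sum(len(r) for r in self.regions)

    def _locate(self, i: int):
        """Convert a logical index into (region number, offset in region)."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        r = bisect.bisect_right(self.starts, i) - 1
        return r, i - self.starts[r]

    def __getitem__(self, i: int):
        r, off = self._locate(i)
        return self.regions[r][off]

    def __setitem__(self, i: int, value) -> None:
        r, off = self._locate(i)
        self.regions[r][off] = value
```

Because the object supports `len()` and integer indexing, a binary search (including the standard `bisect` functions) can run over the combined regions unchanged.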

[0029] When the storage system is split into sub-ranges, a location table is created for each sub-range, sized to the number of records intended to be stored in that sub-range. Each location table must occupy a contiguous region, but the location tables need not all lie in a single contiguous region. Alternate-key blocks are not split across sub-ranges; they are set up in a single contiguous region sized for all records.

[0030] When the first record is stored, the final pointer is first referenced in exclusion mode. The final pointer records how far the blocks and the location table are in use and has the format illustrated in FIG. 1. Since no records are stored at the outset, the first block is registered in the final pointer as the final block, and its block number and primary-key value are registered in the final pointer. Next, the first primary block is secured in exclusion mode, and its physical address and block number (0 in this case, since numbering starts from 0) are registered. When a block is secured on disk, its entry includes the value of its primary key. Next, the record is registered in that block. All exclusion is then released.

[0031] To register the second record, the final pointer is first referenced in exclusion mode, and it is determined whether the record's primary key is greater than the key value in the final pointer. We first describe the case in which the record is added. Since the final block number is 0, entry 0 of the location table is referenced in exclusion mode, the physical location of block 0 is obtained, and block 0 is read from that location. If there is sufficient space in that block, the record is stored, its primary-key value is registered in the final pointer, and all exclusion is released.

[0032] Subsequent additions are performed in the same way. Eventually, block 0 will not have enough space to store an additional record. If, when record number m is added, block 0 is read by the above operation and lacks sufficient space, a single new primary block (block number 1) is secured in exclusion mode, and record m is stored in block 1. The second record in the location table is then referenced in exclusion mode, and the physical location of primary block 1 is registered in it. All exclusion is then released. In this way, added records are stored in the location that physically follows the last record. When the file is split into sub-ranges, the same operations are performed for each sub-range.
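The addition path in paragraphs [0030] to [0032] can be sketched as follows. This is a simplified single-threaded model under stated assumptions: the class and method names, the 64-byte block length, and the integer "physical addresses" are hypothetical, and exclusion (locking) is only marked in comments rather than implemented.

```python
BLOCK_SIZE = 64  # hypothetical fixed block length in bytes


class AppendOnlyFile:
    """Record addition only: each new key must exceed every stored key."""

    def __init__(self) -> None:
        self.storage = {}           # simulated medium: address -> list of (key, data)
        self.location_table = []    # block number -> physical address
        self.final_pointer = None   # (final block number, largest stored primary key)
        self._next_address = 1000   # hypothetical allocator for physical addresses

    def _allocate_primary(self) -> int:
        """Secure a new primary block and register it in the location table."""
        address = self._next_address
        self._next_address += 7     # addresses need not be contiguous
        self.storage[address] = []
        self.location_table.append(address)
        return len(self.location_table) - 1

    def add(self, key, data: bytes) -> None:
        # A real system would take the final pointer in exclusion mode here.
        if self.final_pointer is None:
            block_no = self._allocate_primary()        # first record goes in block 0
        else:
            block_no, last_key = self.final_pointer
            if key <= last_key:
                raise ValueError("key is not larger than the final key; use insertion")
            block = self.storage[self.location_table[block_no]]
            if sum(len(d) for _, d in block) + len(data) > BLOCK_SIZE:
                block_no = self._allocate_primary()    # final block is full
        self.storage[self.location_table[block_no]].append((key, data))
        self.final_pointer = (block_no, key)
        # ...and release all exclusion here.
```

With 20-byte records, three fit in a 64-byte block, so adding keys 1 to 7 fills blocks 0 and 1 and leaves key 7 alone in block 2.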

[0033] Next we describe how a record is inserted. Assume that multiplelocation-table records, primary blocks and data records already exist.

[0034] First, it is necessary to determine the block in which the inserted record should be stored.

[0035] This is done by searching the location table. A binary search is one example of a high-speed search method; it is used here as an example, but other methods may also be used to find the target entry. At each midpoint of the search, the primary-key values of the records stored in that block (abbreviated below as “stored primary-key values”), including those in its overflow blocks if the primary block has any, are compared with the primary-key value of the record being inserted (abbreviated below as “inserted primary-key value”). If the inserted primary-key value is equal to or greater than the smallest stored primary-key value of the block and smaller than the smallest stored primary-key value of the next block, the record is stored in the first of those two blocks. Otherwise, the stored primary-key values of that block are again compared with the inserted primary-key value: the next midpoint is taken in the lower half if the inserted primary-key value is smaller and in the upper half if it is not smaller, and the same operations are repeated until the block to be used for storing the record is identified.
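A minimal sketch of the block search described above, assuming the smallest primary key of each primary block (including its overflow blocks) is available as an ascending sequence. In the text these values are read from the blocks themselves at each probe, or from the location-table entry when a key is stored there. The function name `find_block` is hypothetical. It returns the last block whose smallest key does not exceed the search key, falling back to block 0 for keys smaller than every stored key.

```python
def find_block(min_keys, key) -> int:
    """Binary search over primary blocks.

    min_keys[n] is the smallest primary key stored in primary block n (with its
    overflow blocks), in ascending order. Returns the last block n with
    min_keys[n] <= key, or 0 when key is smaller than every stored key.
    """
    if len(min_keys) == 0:
        raise ValueError("location table is empty")
    lo, hi = 0, len(min_keys) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2   # upper midpoint, so the range always shrinks
        if key < min_keys[mid]:
            hi = mid - 1           # key belongs to an earlier block
        else:
            lo = mid               # block mid, or a later one, starts at or below key
    return lo
```

For smallest keys `[10, 20, 30]`, a record with key 25 is placed in block 1, and a record with key 5 goes to block 0.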

[0036] If the location table is composed of multiple contiguous regions, a binary search cannot be performed on it directly, but using an address-conversion table to treat it as a single contiguous region allows a binary search to be performed.

[0037] If the location table is split across sub-ranges, the key being sought is first compared with either the initial or the final primary-key value of the location table in each sub-range to determine which sub-range holds the target record. A binary search is then performed on that sub-range in the manner described above to identify the block that holds the target record.
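The two-step lookup for a split location table can be sketched with Python's standard `bisect` module performing the binary searches. The data layout is an assumption made for illustration: each sub-range is represented only by the ascending list of smallest primary keys of its blocks, and the sub-range is chosen by comparing against each sub-range's initial key.

```python
import bisect


def locate(subranges, key):
    """subranges: one list per sub-range holding the smallest primary key of each
    primary block in that sub-range; keys ascend across all sub-ranges.
    Returns (sub-range index, block number within that sub-range)."""
    first_keys = [blocks[0] for blocks in subranges]        # initial key of each sub-range
    s = max(bisect.bisect_right(first_keys, key) - 1, 0)    # step 1: pick the sub-range
    blocks = subranges[s]
    b = max(bisect.bisect_right(blocks, key) - 1, 0)        # step 2: binary search inside it
    return s, b
```

Only the chosen sub-range's location table is searched in the second step, so each sub-range can be sized and stored independently.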

[0038] If the record is to be stored in block number n, the following applies. First, the location within the block at which the record is to be inserted is identified. Since records are arranged in primary-key order, the insertion point lies immediately before the first record whose primary key is larger than that of the record being inserted. A check is performed for duplicate primary keys; if a duplicate exists, the record cannot be stored and an error is output. If there is no duplicate, the records located after the insertion point are moved rearward to create a space exactly equal to the size of the record being inserted. If the moved records still fit within the block, the insertion is complete. If they do not, a single overflow block is provided, a pointer to it is set in the primary block, and as many records as necessary are stored in the overflow block. The record to be inserted, together with the following records that still fit, is then stored in the primary block. FIG. 6 illustrates the logical relationship between primary blocks and overflow blocks.
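The in-block insertion with overflow handling might look like the following Python sketch. It is hypothetical in its details: a primary block and its overflow blocks are modelled as a list of lists of `(key, data)` tuples, `BLOCK_SIZE` is an assumed byte limit, spanned records are out of scope (each record must fit in one block), and trailing records are cascaded into the next block of the chain.

```python
import bisect

BLOCK_SIZE = 64  # hypothetical fixed block length in bytes


def used(block) -> int:
    return sum(len(data) for _, data in block)


def insert_into_chain(chain, key, data):
    """Insert (key, data) into a primary block and its overflow chain.

    chain is [primary block, overflow block 1, ...]; each block is a list of
    (key, data) tuples, and the chain as a whole is in primary-key order.
    """
    if len(data) > BLOCK_SIZE:
        raise ValueError("spanned records are not handled in this sketch")
    # Choose the last block in the chain whose first key is <= key (else the primary).
    i = 0
    for j, block in enumerate(chain):
        if block and block[0][0] <= key:
            i = j
    keys = [k for k, _ in chain[i]]
    pos = bisect.bisect_left(keys, key)        # just before the first larger key
    if pos < len(keys) and keys[pos] == key:
        raise KeyError(f"duplicate primary key {key!r}")
    chain[i].insert(pos, (key, data))          # later records shift rearward
    # Push trailing records that no longer fit into the next block, adding a
    # new overflow block at the end of the chain when none is left.
    while used(chain[i]) > BLOCK_SIZE:
        if i + 1 == len(chain):
            chain.append([])
        chain[i + 1].insert(0, chain[i].pop())
        if used(chain[i]) <= BLOCK_SIZE:
            i += 1                             # now check the block that received records
    return chain
```

Inserting key 20 into a full primary block holding keys 10, 30 and 40 (20 bytes each) pushes key 40 into a newly allocated overflow block, as in FIG. 6.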

[0039] If an overflow block already exists, it is preferable to allow the record to be stored across the primary block and its overflow block combined. It is also possible that only part of an overflow block is used, so that the region is not used efficiently. To avoid this, a single overflow block may be provided for multiple primary blocks, each of which points to it. It is also possible to make all overflow blocks an identical size smaller than that of their primary blocks.

[0040] Another possible method is to store an overflowing record in an independent storage region and point to it from its primary block. However, if this method produces a large number of overflow records, retrieval takes longer than with a method using overflow blocks. In practice, the storage method should be selected to suit the number of records expected to be generated.

[0041] Next, we describe storage and updating with alternate keys. Alternate-key tables are stored in alternate-key blocks in the order of their alternate keys. An entry in an alternate-key table consists of an alternate key, the physical address of the block where the record with that key value is stored, and the primary key of that record. The number of entries in an alternate-key table changes when records are added or updated, and an increase in the number of entries is very likely to take the form of an entry insertion and very unlikely to take the form of an entry addition. The management methods used for primary keys are therefore not appropriate. If the number of records already stored exceeds the final number planned, insertions can be processed efficiently by leaving a space of defined size when storing entries in an alternate-key block. If, however, the initial number of records is smaller than the final number of records planned, key insertions will produce multiple alternate-key overflow blocks. A pre-alternate-key block is used in such cases.

[0042] A pre-alternate-key block has the same structure as an alternate-key block. The number of pre-alternate-key blocks is the space needed to hold as many entries as there are alternate-key blocks, divided by the size of a pre-alternate-key block. When the number of entries in the pre-alternate-key block becomes equal to the number of alternate-key blocks, these entries are moved from the pre-alternate-key block to the alternate-key blocks. When the entries are moved, in principle a single entry is stored in each alternate-key block, but entries having an identical alternate key are stored in the same alternate-key block. If there are too many entries with an identical alternate key to fit in the corresponding alternate-key block, they are stored in an added alternate-key overflow block.

[0043] For example, assume that the final number of stored recordsplanned is one million records. If 100 entries can be stored in a singlealternate-key block, 10,000 alternate-key blocks will be required. Theentries are stored in a pre-alternate-key block until there are 10,000of them, and when the number of entries reaches 10,000, the entries aretransferred to an alternate-key block.
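A hypothetical Python sketch of the transfer described in paragraphs [0042] and [0043] follows. Entries are `(alternate_key, block_number, primary_key)` tuples, an alternate-key block is a dict holding its entries and a list of overflow blocks, and each group of equal alternate keys takes the next free block in order. The text does not say how groups should be spaced across the blocks, so that placement is an assumption.

```python
from itertools import groupby


def transfer(pre_entries, n_alt_blocks, capacity):
    """Move entries from a full pre-alternate-key block into alternate-key blocks.

    pre_entries: (alternate_key, block_number, primary_key) tuples; the transfer
    happens once there are exactly n_alt_blocks of them. Each returned block is
    a dict with its "entries" and a list of "overflow" blocks.
    """
    if len(pre_entries) != n_alt_blocks:
        raise ValueError("transfer happens when entries == alternate-key blocks")
    blocks = [{"entries": [], "overflow": []} for _ in range(n_alt_blocks)]
    groups = groupby(sorted(pre_entries), key=lambda e: e[0])
    for slot, (_, group) in enumerate(groups):
        group = list(group)                    # all entries sharing one alternate key
        blocks[slot]["entries"] = group[:capacity]
        rest = group[capacity:]
        while rest:                            # too many equal keys for one block
            blocks[slot]["overflow"].append(rest[:capacity])
            rest = rest[capacity:]
    return blocks
```

With the figures in the example, `n_alt_blocks` would be 1,000,000 / 100 = 10,000 and `capacity` would be 100.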

[0044] If there are many alternate-key blocks and only a single level of pre-alternate-key blocks is provided, the result may be a large number of pre-alternate-key blocks, frequent insertions and inefficient updating. In such cases, multiple levels of pre-alternate-key blocks are provided. In the example above, 10,000 entries are stored in the pre-alternate-key block, which is given two levels: since 100 entries can be stored in a single block, 100 entries are managed in the first-level pre-alternate-key block, and when the number of entries in that first-level block reaches 100, they are transferred to the second-level pre-alternate-key block.

[0045] FIG. 5 illustrates an example of such a transfer in which a two-level pre-alternate-key block is provided.
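The two-level scheme can be sketched as a small first-level block that is merged into a larger second level each time it fills. This is an illustrative model with hypothetical names; the merge uses the standard-library `heapq.merge` on two sorted lists, and the second level is handed back to the caller once it holds as many entries as there are alternate-key blocks (10,000 in the example, with `fanout` 100).

```python
import bisect
import heapq


class TwoLevelPreAltKey:
    """Two-level pre-alternate-key block (hypothetical model).

    level1 holds at most `fanout` entries. When it fills, it is merged into
    level2. When level2 holds `n_alt_blocks` entries, they are returned so the
    caller can move them into the alternate-key blocks.
    """

    def __init__(self, fanout: int, n_alt_blocks: int) -> None:
        self.fanout = fanout
        self.n_alt_blocks = n_alt_blocks
        self.level1: list = []
        self.level2: list = []

    def add(self, entry):
        bisect.insort(self.level1, entry)          # cheap: level1 is small
        if len(self.level1) < self.fanout:
            return None
        # First level is full: merge its sorted run into the second level.
        self.level2 = list(heapq.merge(self.level2, self.level1))
        self.level1 = []
        if len(self.level2) < self.n_alt_blocks:
            return None
        ready, self.level2 = self.level2, []
        return ready                                # ready for transfer to alternate-key blocks
```

Each new entry is inserted only into the small first-level block; the larger second level is touched once per `fanout` entries, which is the saving the paragraph describes.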

[0046] Next, we describe retrieval of records by primary key. This operation is performed in the same way as determining the insertion location when a record is inserted, and the example here uses the same binary search. First, a midpoint is found in the location table, and the primary-key values of the records stored in that block (the “stored primary-key values”), including those in its overflow blocks if the primary block has any, are compared with the primary-key value of the target record (abbreviated below as “target primary-key value”). If the target primary-key value is equal to or greater than the smallest stored primary-key value of the block and smaller than the smallest stored primary-key value of the next block, either the target record is in that block or no record with that key value exists in the file. Since records in a block are arranged in primary-key order, searching the block either finds the target record or confirms that it does not exist in the file. Otherwise, the stored primary-key values of that block are again compared with the target primary-key value: the next midpoint is taken in the lower half if the target primary-key value is smaller and in the upper half if it is not smaller, and the same operations are repeated to identify the block storing the record.
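Retrieval by primary key combines the block search with a search inside the block. In the hypothetical model below, `storage` maps a physical address to a primary block followed by its overflow blocks, and `location_table[n]` gives the address of block n; addresses need not be contiguous. The smallest key of a probed block is read from the block itself, as the text describes. The function name and data layout are assumptions for illustration.

```python
import bisect


def retrieve(location_table, storage, key):
    """location_table[n] -> physical address of primary block n.
    storage[address] -> list of blocks (primary first, then its overflow blocks),
    each a list of (key, data) tuples in primary-key order.
    Returns the record's data, or None if no record with `key` exists."""
    if not location_table:
        return None

    def smallest_key(n):                     # read block n and return its first key
        return storage[location_table[n]][0][0][0]

    lo, hi = 0, len(location_table) - 1
    while lo < hi:                           # find last block whose smallest key <= key
        mid = (lo + hi + 1) // 2
        if key < smallest_key(mid):
            hi = mid - 1
        else:
            lo = mid
    records = [r for blk in storage[location_table[lo]] for r in blk]
    keys = [k for k, _ in records]
    i = bisect.bisect_left(keys, key)        # records in a block are in key order
    return records[i][1] if i < len(keys) and keys[i] == key else None
```

A miss inside the identified block is conclusive: as the paragraph notes, the record then does not exist anywhere in the file.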

[0047] Next, we describe retrieval by alternate key, which is performed by searching the alternate-key blocks. A binary search is typically used; since it is described above for retrieval by primary key, the details are omitted here. First, the alternate-key block containing the target alternate key is identified, and then the target alternate-key table (entry) within that block. As with primary keys, either the entry exists in that block or no record with that alternate-key value exists in the file. If the alternate-key block has any alternate-key overflow blocks, all of them are searched as well.

[0048] Once the alternate-key table (or entry) having the target key value is identified, the physical block is accessed using the physical block number in that entry, and the record in that block whose primary key matches the primary-key value in the entry is identified. Since alternate keys may be non-unique, the next entry in the alternate-key block is then examined. If it has an identical alternate-key value, the record corresponding to that entry is also retrieved, and this operation is repeated until an entry with a different key value is reached.
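The alternate-key lookup in paragraphs [0047] and [0048] can be sketched as follows, under assumed data shapes: each alternate-key block is a pair of its sorted entries and its overflow blocks, entries are `(alternate_key, physical_block, primary_key)` tuples, and `storage` maps a physical block to its `(primary_key, data)` records. The function name is hypothetical, and empty alternate-key blocks are simply skipped.

```python
import bisect


def retrieve_by_alternate_key(alt_blocks, storage, alt_key):
    """alt_blocks: alternate-key blocks in alternate-key order; each is
    (entries, overflow_blocks), entries being (alt_key, physical_block, primary_key).
    storage: physical_block -> list of (primary_key, data) records.
    Returns the data of every record whose alternate key equals alt_key."""
    nonempty = [b for b in alt_blocks if b[0]]
    if not nonempty:
        return []
    firsts = [entries[0][0] for entries, _ in nonempty]
    i = max(bisect.bisect_right(firsts, alt_key) - 1, 0)   # block that may hold alt_key
    entries, overflow = nonempty[i]
    candidates = entries + [e for blk in overflow for e in blk]  # overflow blocks searched too
    j = bisect.bisect_left(candidates, (alt_key,))          # first entry with this key
    results = []
    while j < len(candidates) and candidates[j][0] == alt_key:  # non-unique: keep scanning
        _, phys, pk = candidates[j]
        data = next((d for k, d in storage[phys] if k == pk), None)  # primary key pins the record
        if data is not None:
            results.append(data)
        j += 1
    return results
```

Because entries with equal alternate keys are kept in one alternate-key block and its overflow blocks, the scan never has to cross into a neighbouring block.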

[0049] Next, we describe generation. Generation may be useful in a variety of situations, among them when multiple records already exist in this system's file and regeneration is required because, for example, the number of overflow blocks has grown; when restoring information from a backup medium to the medium on which this system is implemented; and when moving records stored by another method into this system. All of these cases can be handled by the same means.

[0050] For regeneration, the file is read in primary-key order by sequential access, and a sequential file is created. A sequential file is created in the same way for generation.

[0051] Next, the location tables and alternate-key blocks are prepared. The number of location-table entries is obtained by dividing the number of records planned to be stored by the number of records that can be stored in a single block, and that amount of space is secured in a contiguous region. Alternate-key blocks are secured for each type of alternate key. All alternate-key blocks for a given type of alternate key have the same size, and their number is determined as follows: the number of entries (A) that can be stored in a single alternate-key block is obtained, and the number of records planned to be stored is divided by A. Pre-alternate-key blocks are also prepared.
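The sizing rules above reduce to a few divisions. In the sketch below the function and dictionary key names are hypothetical, and results are rounded up (an assumption, so that a partial block still receives space). The pre-alternate-key count follows paragraph [0042] and assumes a pre-alternate-key block holds as many entries as an alternate-key block.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def plan_regions(planned_records, records_per_block, entries_per_alt_block, n_alternate_keys):
    """Hypothetical sizing helper for generation."""
    alt_blocks = ceil_div(planned_records, entries_per_alt_block)   # per alternate key
    return {
        "location_table_entries": ceil_div(planned_records, records_per_block),
        "alt_key_blocks_per_key": alt_blocks,
        # Pre-alternate-key blocks: room for one entry per alternate-key block.
        "pre_alt_key_blocks_per_key": ceil_div(alt_blocks, entries_per_alt_block),
        "alt_key_blocks_total": alt_blocks * n_alternate_keys,
    }
```

For one million planned records, 40 records per data block and 100 entries per alternate-key block, this gives 25,000 location-table entries and 10,000 alternate-key blocks per alternate key, matching the example in paragraph [0043].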

[0052] Records are stored in blocks in primary-key order. The frequency of insertions may be estimated in advance or calculated from statistical data, and a certain proportion of empty space left within each block, which also allows records to be stored immediately. The amount is determined by how often insertions are expected to be performed, and the proportion of empty space may vary from block to block. When a record is stored, an entry is written to the location table. If a record has an alternate key, an alternate-key entry is first created from the record and stored in a sequential file. When alternate-key entries have been created for all records, they are sorted in alternate-key order and stored in the alternate-key blocks. The number of records generated is divided by the maximum planned number of records, and entries are stored in the alternate-key blocks in the proportion corresponding to that quotient, because later alternate keys will be inserted rather than added.
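A hedged sketch of generation under two assumptions: each data block is filled only to `1 - free_fraction` of its capacity, and storage "in the proportion corresponding to that quotient" is read as placing `entries_per_block * generated / planned` entries in each alternate-key block. Function names are hypothetical, and for brevity the sketch ignores the rule that entries with equal alternate keys share a block.

```python
def load_blocks(sorted_records, block_size: int, free_fraction: float):
    """Pack (primary_key, data) records, already in key order, into blocks,
    leaving roughly free_fraction of each block empty for later insertions."""
    limit = int(block_size * (1 - free_fraction))
    blocks, current, used = [], [], 0
    for key, data in sorted_records:
        if current and used + len(data) > limit:
            blocks.append(current)            # close this block, start the next
            current, used = [], 0
        current.append((key, data))
        used += len(data)
    if current:
        blocks.append(current)
    return blocks


def load_alternate_keys(entries, n_generated: int, n_planned: int, entries_per_block: int):
    """Sort alternate-key entries and place only entries_per_block * n_generated
    // n_planned of them in each alternate-key block, leaving room for inserts."""
    per_block = max(1, entries_per_block * n_generated // n_planned)
    entries = sorted(entries)
    return [entries[i:i + per_block] for i in range(0, len(entries), per_block)]
```

If half the planned records exist at generation time, each alternate-key block is loaded half full, so later alternate keys can be inserted without immediately creating alternate-key overflow blocks.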

[0053] An alternate key may also be put to special use. Conventionally, keys are assigned to fields stored in a specific location in a record, so they have an identical length and format, such as a product code or a customer code. Since keys and records may be made relational in this system, keys may also be created for text data. Keys may also be created for records of nonspecific format, as illustrated in FIG. 7. As this example shows, alternate keys may be assigned to records whose fields have neither uniform locations nor fixed lengths.

What is claimed is:
 1. An information storage system for computers, comprising: a multitude of blocks of fixed length for storing a multitude of records, each record having a unique key (a key that does not duplicate the key value of another record, hereafter called the “primary key”) and zero or more non-unique keys (keys that may duplicate the key values of different records, hereafter called “alternate keys”), said records being stored in said blocks in the order of their respective primary keys; a structure such that said blocks consist of primary blocks and overflow blocks, said records to be stored first in said primary blocks; a facility for allocating an overflow block in the event an inserted record cannot be stored in a primary block and for allocating further overflow blocks in the event an inserted record cannot be stored in a single overflow block, the record then to be stored serially across said overflow blocks; a facility for allocating a new primary block in the event an added record cannot be stored in the final primary block, the record then to be stored in said new primary block; a location table used to manage the locations of said primary blocks such that blocks may be positioned with no restriction whatsoever on their physical locations; a facility to create blocks as they become required until the physical data storage area is full, such that no block need be created in advance; a facility to partition files consisting of multiple record insertions after multiple specified primary keys into multiple sub-ranges at the location of the file's insertion, this operation being treated as the addition rather than the insertion of records, thus preventing the generation of overflow records; and retention of said primary and overflow blocks partitioned across multiple computers.
 2. An information retrieval system, operating upon the information storage system of claim 1 and comprising: a facility for using location tables and alternate tables, contiguous space for each of which is secured in advance in sizes corresponding to the number of records stored; a location table containing, as a single entry, the location-table number (a serial number beginning with zero and identical to the primary block number) and the physical address of the primary block, structured such that entries are stored in the order of their location-table numbers; a facility whereby retrieval by primary key is performed by searching the location table, the target record being retrieved by identifying the block that includes the target key value and then retrieving the record in that block; inclusion in individual location-table entries of the primary key of the corresponding block in the event a random-access memory media device is used to store records, block searches then being performed by means of the location table alone; a facility for creating alternate-key tables when alternate keys are used, said alternate-key tables to be stored in alternate-key blocks, each capable of storing multiple entries, space for said alternate-key blocks being secured in advance in identical sizes and in the quantity required; a structure whereby each said entry is made up of an alternate key, the number of the block in which the corresponding record is stored, and the primary key of that corresponding record; a structure whereby said entries are stored in ascending order of their alternate keys; a facility for treating multiple records existing within a single alternate key as a single entry; a structure whereby entries having identical alternate keys are stored in the identical alternate-key block; a facility for storing entries in an alternate-key overflow block added to said alternate-key block in the event an entry or entries cannot be stored in an alternate-key block because a large number of entries have an identical alternate key or because of the insertion of an alternate key; a facility for retrieval by means of alternate keys such that the entry containing a target alternate-key value is identified by searching said alternate-key block, the block number in which the record is stored is obtained from that entry, and the corresponding record within that block is retrieved, and such that entries stored in both alternate-key blocks and alternate-key overflow blocks are retrieved in the event said alternate-key blocks alone are subjected to searching and an alternate-key overflow block exists for the alternate-key block searched; a facility for creating pre-alternate-key blocks of a structure identical to alternate-key blocks, in a quantity obtained by dividing the space capable of storing as many entries as there are alternate-key blocks by the size of the blocks; a facility such that, when there are extremely few data records compared to the final number of storage records and consequently a high probability that the addition of a further record will frequently result in the insertion of an alternate key, additional records are not stored first in an alternate-key block corresponding to the final number of records but are stored in pre-alternate-key blocks until the number of records reaches the number of alternate-key blocks; a facility for moving entries from pre-alternate-key blocks to the alternate-key blocks at the point when the number of pre-alternate-key block entries becomes equal to the number of alternate-key blocks, whereby a single entry is, in principle, stored in each alternate-key block when said entries are moved, but entries having identical alternate keys are stored in identical alternate-key blocks;
 a facility for storing entries in an alternate-key overflow block added to an alternate-key block when said alternate-key block cannot hold said entries because a large number of said entries have an identical alternate key; a facility for sequential reading by primary key, performed by obtaining the physical address of the primary block from the location table and sequentially reading the records in that primary block and, if an overflow block or blocks exist, the records in said overflow block or blocks, the subsequent block being determined by obtaining the next entry in the location table and obtaining the physical block from said entry; and a facility for sequential access by alternate key that first reads the first record as described above and reads each next record by sequentially accessing the record of the next entry in the alternate-key block.